Exploration of Contextual Constraints for Character Pre-Classification
Abstract
We present strategies and results for identifying the symbol type (lower case, upper case, digit, or punctuation and other special symbols) of every character in a text document, using several kinds of information from neighboring characters. Assuming reasonable word and character segmentation for shape clustering, we designed several type recognition methods that depend on cluster n-grams, shape codes, and within-word context. On an ASCII test corpus of 925 articles that simulates perfect image-level processing, these methods achieve a substantial improvement over the default assignment of all characters to lower case.
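The default assignment used as the point of comparison in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `symbol_type` and `baseline_accuracy` are hypothetical, and skipping whitespace when scoring is an assumption made here for illustration only.

```python
import string


def symbol_type(ch: str) -> str:
    """Map one character to one of the four symbol types named in the abstract."""
    if ch in string.ascii_lowercase:
        return "lower"
    if ch in string.ascii_uppercase:
        return "upper"
    if ch in string.digits:
        return "digit"
    # Everything else (punctuation, special symbols) falls into one class.
    return "other"


def baseline_accuracy(text: str) -> float:
    """Accuracy of the default rule that labels every character as lower case.

    Assumption: whitespace is not scored, since it is not a symbol to classify.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if symbol_type(c) == "lower")
    return hits / len(chars)
```

Any context-based method would be judged by how far it raises this figure on the same text; for ordinary English prose the default rule is already right for most characters, which is why it serves as the reference point.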
Similar resources
Named Entity Recognition Using a Character-based Probabilistic Approach
We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps word...
Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten
Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheques, and conversion of handwritten documents into structured text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...
Combining Lightly-Supervised Text Classification Models for Accurate Contextual Advertising
In this paper we propose a lightly-supervised framework to rapidly build text classifiers for contextual advertising. In contextual advertising, advertisers often want to target a specific class of webpages most relevant to their product, which may not be covered by a pre-trained classifier. Moreover, the advertisers are only interested in the target class. Therefore, it is more suitable to m...
Deformed Systems for Contextual Text Recognition
A fuzzy method for incorporating contextual constraints into a text recognition system is presented here. The method takes as input all the internal results that an Isolated Character Classifier (ICC) computes for an input letter, instead of a unique output character. The internal result is handled here as a fuzzy set which is then processed by a Deformed System. Such a Deformed System repr...
An Improved Pre-classification Method for Off-line Handwritten Chinese Character Using Four Corner Feature
Pre-classification can effectively improve the performance of handwritten Chinese character recognition. This paper presents a method that uses the four corner feature for pre-classification of handwritten Chinese characters. Considering writing variations, we define a set of basic stroke structures and match them with the structures in the four corner regions of the character image. The matching result wi...